perf: replace OFFSET pagination with ID-range batching in async jobs#148
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces ID-range based keyset pagination to replace inefficient OFFSET-based pagination for asynchronous job dispatching. It adds a new utility, compute_id_ranges, which utilizes PostgreSQL's NTILE function to calculate batch boundaries, and updates the get_beneficiaries methods and various managers in the spp_programs module to support min_id and max_id parameters. Feedback was provided regarding a potential race condition in the pagination utility that could lead to a TypeError if records are deleted between database queries.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## 19.0 #148 +/- ##
==========================================
+ Coverage 71.24% 71.35% +0.11%
==========================================
Files 931 932 +1
Lines 54742 54771 +29
==========================================
+ Hits 38999 39082 +83
+ Misses 15743 15689 -54
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
ffb1394 to
b4b9ddc
Compare
There was a problem hiding this comment.
@kneckinator is this really not intended to be added to the init.py?
OFFSET N causes PostgreSQL to scan and discard N rows, making later batches progressively slower. This replaces all async job dispatchers with NTILE-based ID-range batching that uses WHERE id BETWEEN min_id AND max_id, which is O(1) via the primary key index.
Handle the unlikely case where records are deleted between the COUNT and MIN/MAX queries, which would cause a TypeError on None values.
Cover the _check_eligibility_async, _prepare_entitlements_async, and the isinstance(states, str) branch in _enroll_eligible_registrants_async.
b4b9ddc to
b805be1
Compare
Summary
WHERE id BETWEEN min_id AND max_id, which is O(1) via the primary key index regardless of batch positionmin_id/max_idsupport toget_beneficiaries()on bothspp.programandspp.cycleChanges
pagination_utils.py(new):compute_id_ranges()helper using PostgreSQL NTILE window function to pre-compute (min_id, max_id) boundaries in a single SQL queryprogram_manager.py:_enroll_eligible_registrants_async()dispatches jobs with ID ranges instead of offset/limitcycle_manager_base.py:_check_eligibility_async()and_prepare_entitlements_async()use ID rangescycle_manager.py: UpdatedCustomDefaultCycleManager._prepare_entitlements()override to forwardmin_id/max_idprograms.py/cycle.py:get_beneficiaries()supportsmin_id/max_idfor range queriestest_keyset_pagination.py(new): 15 tests covering the helper, get_beneficiaries range queries, and async dispatch integrationContext
Phase 6 of 9 in the
spp_programsperformance optimization effort. Rebased on current 19.0. Version bumped to 19.0.2.0.8.Test plan
./scripts/test_single_module.sh spp_programs— 606 tests, 0 failurespre-commit run --files <changed_files>— all checks passODOO_INIT_MODULES=spp_programs docker compose --profile ui up -d